Compiler-Assisted Dynamic Predicated Execution of Complex Control-Flow Structures
Authors
Abstract
Even after decades of research in branch prediction, branch predictors remain imperfect, which results in significant performance loss in aggressive processors that support large instruction windows and deep pipelines. This paper proposes a new processor architecture for handling hard-to-predict branches, the diverge-merge processor. The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them. To achieve this without incurring large hardware cost and complexity, the compiler identifies branches that are suitable for dynamic predication, called diverge branches. The compiler also selects a control-flow merge (or reconvergence) point corresponding to each diverge branch to aid dynamic predication. If a diverge branch is hard to predict at run time, the microarchitecture dynamically predicates the instructions between the diverge branch and the corresponding merge point by first executing one path after the branch, then executing the other path, and later merging the data flow produced by the two paths using special select-uop instructions. The control-flow merge point is selected based on the frequently executed paths in the program using profile information. Therefore, the control flow from a diverge branch does not have to merge (but it usually does), which allows the dynamic predication of a much larger set of branches than simple hammock (if-else) branches. Our evaluations show that a diverge-merge processor outperforms a baseline with an aggressive branch predictor by 10.8% on average over 15 SPEC CPU2000 benchmarks, through an average reduction of 31% in pipeline flushes due to branch mispredictions. Furthermore, the proposed mechanism outperforms a previously proposed dynamic predication mechanism that can predicate only simple hammock branches by 7.8%.
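The merge step described in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's microarchitecture: the names `run_path` and `dynamic_predicate`, the dictionary register file, and the representation of a path as a list of `(dest, fn)` operations are all hypothetical. It executes both paths of a diverge branch on private copies of the register state, then uses one select step per written register to keep the value from the path that the resolved branch condition says is correct.

```python
def run_path(regs, path):
    """Execute a path's ops on a private copy of the register state.

    `path` is a list of (dest, fn) pairs; fn maps the current
    register state (a dict) to the new value of `dest`.
    """
    state = dict(regs)
    for dest, fn in path:
        state[dest] = fn(state)
    return state


def dynamic_predicate(regs, cond, taken_path, not_taken_path):
    """Toy model of dynamically predicating one diverge branch.

    Assumption: every register written on either path already has
    an initial value in `regs`.
    """
    # Execute one path after the branch, then the other path.
    taken_state = run_path(regs, taken_path)
    not_taken_state = run_path(regs, not_taken_path)

    # The predicate is the resolved branch condition.
    predicate = cond(regs)

    # At the merge point, one select per register written on either path
    # (standing in for the select-uops named in the abstract).
    written = {d for d, _ in taken_path} | {d for d, _ in not_taken_path}
    merged = dict(regs)
    for reg in sorted(written):
        merged[reg] = taken_state[reg] if predicate else not_taken_state[reg]
    return merged


# Hypothetical if-else: if r1 > 3 then r2 = r1 + 1 else r2 = r1 - 1; r3 = 0
taken = [("r2", lambda r: r["r1"] + 1)]
not_taken = [("r2", lambda r: r["r1"] - 1), ("r3", lambda r: 0)]
result = dynamic_predicate({"r1": 5, "r2": 0, "r3": 7},
                           lambda r: r["r1"] > 3, taken, not_taken)
```

Because both paths have already run by the time the condition is known, no flush is needed whichever way the branch resolves; the cost is the work spent on the path that is discarded.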
Similar resources
Predicate-Based Transformations to Eliminate Control and Data-Irrelevant Cache Misses
The performance of modern processors is increasingly dependent on their ability to execute multiple instructions per cycle. Explicitly Parallel Instruction Computing (EPIC) architectures can achieve high performance by using the compiler to express program instruction level parallelism (ILP) directly to the hardware. The predicated execution feature is critical to the success of the EPIC archit...
Speculative pre-execution assisted by compiler (SPEAR)
Speculative pre-execution achieves efficient data prefetching by running additional prefetching threads on spare hardware contexts. Various implementations for speculative pre-execution have been proposed, including compiler-based static approaches and hardware-based dynamic approaches. A static approach defines the p-thread at compile time and executes it as a stand-alone running thread. There...
A Comparison of Full and Partial Predicated Execution
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing t...
Support for Software Assisted Speculative Execution
Computer architects strive to improve machine performance by exploiting parallelism, but control flow and data dependences limit available parallelism. Speculative execution enhances parallelism by selectively ignoring the constraints of control flow and data dependences, thereby executing instructions before it is known whether they are needed or correct. Software assisted speculative executio...
ISCA-22, Jun 1995: A Comparison of Full and Partial Predicated Execution Support for ILP
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult. On one end of the design spectrum, architectural support for full predicated execution requires increasing t...